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Abstract 

Background: Inter-individual epigenetic variation, due to genetic, environmental or random influences, is observed 
in many eukaryotic species. In mammals, however, the molecular nature of epiallelic variation has been poorly 
defined, partly due to the restricted focus on DNA methylation. Here we report the first genome-scale investigation 
of mammalian epialleles that integrates genomic, methylomic, transcriptomic and histone state information. 

Results: First, in a small sample set, we demonstrate that non-genetically determined inter-individual differentially 
methylated regions (iiDMRs) can be temporally stable over at least 2 years. Then, we show that iiDMRs are 
associated with changes in chromatin state as measured by inter-individual differences in histone variant H2A.Z 
levels. However, the correlation of promoter iiDMRs with gene expression is negligible and not improved by 
integrating H2A.Z information. We find that most promoter epialleles, whether genetically or non-genetically 
determined, are associated with low levels of transcriptional activity, depleted for housekeeping genes, and either 
depleted for H3K4me3/enriched for H3K27me3 or lacking both these marks in human embryonic stem cells. The 
preferential enrichment of iiDMRs at regions of relative transcriptional inactivity validates in a larger independent 
cohort, and is reminiscent of observations previously made for promoters that undergo hypermethylation in 
various cancers, in vitro cell culture and ageing. 

Conclusions: Our work identifies potential key features of epiallelic variation in humans, including temporal 
stability of non-genetically determined epialleles, and concomitant perturbations of chromatin state. Furthermore, 
our work suggests a novel mechanistic link among inter-individual epialleles observed in the context of normal 
variation, cancer and ageing. 
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Background 

Epialleles are genomic loci at which the epigenetic state 
can stably vary among individuals in a given population 
[1]. Although first described and still best understood in 
plants [2-4], in recent years we have come to realise that 
epigenomic landscapes in mammals can also show con- 
siderable inter-individual variation (reviewed in [1,5]). 
Mammalian epialleles could arise through the action of 
cis- or trans-genetic influences [6,7], or have non-genetic 
origins as a result of: (1) potential stochastic events [8,9]; 
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(2) exposure to a compromised in utero environment as 
has been shown in rodent and human studies [10-12]; (3) 
or adult life-style associated factors such as smoking [13]. 
Despite these and other previous studies, the molecular 
nature of mammalian epialleles, in particular those 
induced by non-genetic factors, has remained controver- 
sial [14]. To a large extent, this is due to the DNA methy- 
lation-focus of previous investigations [5]. Incorporation 
of information about the chromatin state would refine 
our understanding of the molecular nature and ultimately 
functionality of epialleles in the context of normal varia- 
tion or disease states. 

Here we describe the first systematic interrogation of 
mammalian epialleles that integrates genomic, methylo- 
mic, transcriptomic and chromatin state information. 
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Using a combination of experimental and computational 
analyses we identify key features of epiallelic variation in 
humans, including demonstrating that even non-geneti- 
cally determined epialleles can be temporally stable, and 
that DNA methylation variability at epialleles is associated 
with concomitant perturbations in chromatin state. Most 
notably, we find that promoter-associated epiallelic varia- 
tion is predominantly associated with developmentally 
important and/or tissue-restricted genes. A similar cate- 
gory of genes is preferentially hyper-methylated in various 
cancers, in vitro cellular transformation, and human 
chronological ageing, potentially pointing to a novel 
mechanistic link between these processes and human 
inter-individual epiallelic variation. 

Results 

Comprehensive genomic/epigenomic/transcriptomic 
profiling of monozygotic (MZ) twin pairs 

For the initial discovery phase of the study, we originally 
aimed to generate integrated genomic/epigenomic/tran- 
scriptomic profiles for CD 14+ cells from three different 
healthy MZ twin pairs of European ancestry (MZ pair 
31/32: female, age at sampling 27 years; MZ pair 21/22: 
female, age at sampling 27 years; MZ pair, male, 11/12, 
age at sampling 19 years Additional file 1, Table S2). We 
focussed on CD 14+ cells as they can be obtained to >90% 
purity [17] and Additional file 1, Figure SI, and are less 
likely to harbor post-differentiation, random epigenetic 
alterations as they have a lifespan of only a few weeks. 
Genetic profiles were obtained using the Illumina 
Omni2.5S array that interrogates approximately 2.5 mil- 
lion single nucleotide polymorphisms (SNPs) with a 
minor allele frequency of down to 1%. DNA methylation 
was assayed by Illumina450K arrays that provide bisulfite 
conversion-based, single base resolution methylation 
measurements at approximately 450,000 different cyto- 
sines associated with a range of genomic features such as 
promoters, enhancers, and CpG islands (CGIs) [15]. 
Gene expression was profiled using the standard Illumina 
mRNA-seq protocol. For the analysis of chromatin state 
we performed ChlP-seq on the histone variant H2A.Z, 
which is strongly associated with transcriptional activity 
(but can also be found at transcriptionally silent promo- 
ters), and is thought to be environmentally responsive 
[16]. We obtained 60-80 million mapped 36 bp paired 
end reads for the ChlP-seq and RNA-seq libraries. All 
genomic/epigenomic/transcriptomic profiles were gener- 
ated from the same single sampling of CD 14+ cells for 
each individual. Unfortunately, during the course of pro- 
cessing the samples, the DNA sample that was to be used 
for subsequent DNA methylation analysis for individual 
'12' was inadvertently lost. As these twins were recruited 
because we had methylomic data from a previous time- 
point to test for temporal stability, as discussed below, it 



was not feasible to repeat all the functional genomic 
assays on additional MZ pairs. 

Identification of temporally stable inter-individual DMRs 
(iiDMRs) 

An initial low-level analysis confirmed anticipated geno- 
mic/epigenomic correlations. First, we confirmed known 
functional correlations between DNA methylation, H2A.Z 
levels, and gene expression: promoter DNA methylation 
was negatively correlated with both H2A.Z and gene 
expression levels, and promoter H2A.Z levels were strongly 
positively correlated with gene expression (Figure 1A). 
Second, analysis of the SNP arrays did not reveal intra-MZ 
pair SNP or copy number variation (approximately 640,000 
to 660,000 SNPs observed between unrelated individuals, 
and 104 and 115 SNPs observed in 31/32 and 21/22 MZ 
pairs, respectively, and these are most likely false positives). 
Finally, DNA methylation profiles were substantially more 
similar between co-twins compared with unrelated indivi- 
duals (Figure IB). 

We then called inter-individual DMRs (iiDMRs) 
between all five possible pair-wise comparisons (two dif- 
ferent MZ pairs and one unpaired' co-twin). Illumina450K 
probe sequences were remapped using BLAT, and we only 
used probes that mapped exactly once. Furthermore, we 
ignored probes on chrX and Y, and also those that over- 
lapped known SNPs (based on information provided in 
the Illumina 450K annotation file) to eliminate artefacts 
due to differential probe hybridization effects, resulting in 
a final set of 369,908 different probes. Exclusion of these 
probes eliminates only ds-acting genetic variants present 
in the 50 bp sequence covered by the probe, and all other 
cis- and trans-genetic effects are retained. We considered 
only those iiDMRs with a directionally consistent >5% 
absolute methylation difference at >2 adjacent CpGs 
within 500 bp of each other, as the biological relevance of 
very small methylation differences limited to single CpG 
sites is currently unclear. The number of iiDMRs found in 
each of the five different pairwise comparisons is shown in 
Table 1. Consistent with Kaminsky et al [8], iiDMRs were 
found across the genome, but enriched at non-promoter 
non-exonic regions, and a greater number of iiDMRs were 
observed between unrelated individuals (Figure 2A). 
Furthermore, we generated triplicate Illumina 450K pro- 
files for CD 14+ cells from two additional MZ twin pairs, 
different to those described above, and found that the 
amount of biological variation seen between twins 
exceeded the technical variation (false positive iiDMR 
calls) by a factor of approximately 5 (P <10" 7 , permutation 
test) (Figure 2B). 

The epigenomic landscape for any given individual is, 
to an extent, constantly in flux, and many iiDMRs iden- 
tified from single time-point measurements probably do 
not have long-term phenotypic consequences. Therefore 
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31/32 21/22 31/21 31/11 11/22 



Figure 1 Epigenomic, genomic and transcriptomic profiling in CD14+ cells from MZ twins. (A) We analysed the following correlations at 
promoter regions: DNA methylation vs. gene expression, H2A.Z vs. gene expression and DNA methylation vs. H2A.Z. Gene expression levels are 
represented as log-transformed aligned average FPKM (fragments per kilobase of exon per million fragments mapped) from RNA-seq data. FPKM 
values were determined by TopHat and Cufflinks and assigned to TSSs. DNA methylation values were determined as (3 values using lllumina 
450K array (see Materials and Methods) and are represented here as percentage of methylation bins. Probes on the lllumina 450K array were 
assigned to their closest TSS, discarding assignments >1 kb away. H2A.Z abundance is represented as normalised read counts from ChlP-seq 
data correlates positively with gene expression (Spearman's p = 0.50, P <10" 3 ), DNA methylation is anti-correlated with H2A.Z (p = -0.59, P <10" 3 ). 
Promoters were defined as TSS ± 1,000 bp. (B) Correlation coefficients of inter-individual DNA methylation differences. 31/32 and 21/22 plots 
correspond to MZ twin iiDMRs; 31/21, 31/11 and 11/22 correspond to iiDMRs found in unrelated individuals. 
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Table 1 Number of iiDMRs identified in the various 
pairwise comparisons. 



Comparison 



iiDMRs {n) 



31 vs. 32 


5,077 


21 vs. 22 


2,528 


31 vs. 21 


9,512 


31 vs. 11 


9,062 


1 1 vs. 22 


10,526 



Number of iiDMRs for each comparison was calculated as DNA methylation 
differences >5% at >2 adjacent CpGs within 500 bp of each other. 



obtaining some measure of iiDMR temporal stability is 
critical, especially for non-genetically determined 
iiDMRs. The CD14+ cells used in this study were 
obtained from individuals recruited in late 2010. We 
previously sampled the same individuals in mid-2008 as 
part of a separate study in which we profiled their CD 14 
+ cells using the Illumina27K array [17]. This array con- 
tains probes for 27,578 different CpG sites, largely pro- 
moter-associated, and >90% of these CpGs are also 
represented on the Illumina450K array. For maximum 




Methylation% difference [current] Methylation% difference [current] Methylation% difference [current] Methylation% difference [current] 

Figure 2 Genomic characteristics and temporal stability of inter-individual DMRs (iiDMRs). (A) This plot shows the distribution of iiDMRs 
across different genomic features (promoters, regulatory regions (that is, Regulatory Features from Ensembl), exons, introns and all other regions 
(that is, other). iiDMRs were defined as differences >5% in DNA methylation between individuals present in two adjacent probes of the lllumina 
450K array. The feint line shows the expected level if iiDMRs were found equally in all regions of the genome. (B) A comparison of technical vs. 
biological variation for iiDMRs. Shown is the number of iiDMR calls between lllumina 450K array for technical replicates, twin pairs and unrelated 
individuals. This analysis was done with CD14+ DNA from two MZ twin pairs in triplicate. The biological variation significantly exceeded the 
technical variation {P <10" 7 , permutation test). (C) Temporal stability of iiDMRs. The inter-individual DNA methylation differences found in this 
study were compared with the DNA methylation differences present in the same individuals 2 years prior. The previous DNA methylation 
differences were obtained from CD14+ DNA methylation profiles from the same individuals using lllumina 27K data as part of a separate study 
[17]. Shown are all current and historic inter-individual methylation differences from all the common probes present on lllumina 27K and 
lllumina 450K arrays. 
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power when comparing to the much less densely spaced 
probes on the Illumina27K array, we considered all >5% 
methylation differences, rather than just those in multi- 
ple-probe iiDMRs. Comparison of the 2010 with 2008 
methylomic profiles, for CpG sites common to both 
platforms, revealed that a substantial proportion of the 
epigenetic variation seen in the 2010 samplings were also 
found in the 2008 samplings, that is, was temporally stable 
(Figure 2C). Although this is not surprising for compari- 
sons between unrelated individuals, since many of the 
methylation differences in such cases most likely represent 
stable genetic effects, temporal stability in the intra-MZ 
pair comparisons is particularly noteworthy as it means 
that random or environmental events can induce perma- 
nent, or at least semi-permanent, epiallelic variation. 
Furthermore, given that the lifespan of CD 14+ cells is only 
a few weeks, compared with the approximately 2-year gap 
between the first and second samplings, these epialleles 
most probably arise in blood progenitor cells. For the 
CpGs in common between the second (450K array) and 
first time-points (27K arrays), the proportion of methyla- 
tion differences found at the second time-point that were 
also present at the first time-point is shown in Additional 
file 1, Figure S2. The number of sites where a difference in 
the same direction is seen at both sites significantly 
exceeds what would be expected by chance in all cases 
(P <3 xlO" 13 , % 2 test). This is the first genome-scale 
demonstration of temporally stable epiallelic variation in 
mammals that cannot potentially be explained by stable 
genetic effects (for example, [18]). For further analyses we 
did not restrict ourselves to CpG sites common to the Illu- 
mina450K and 27K arrays, otherwise we would have been 
left with too few DMRs for any meaningful analyses, but it 
is reasonable to assume that the same degree of temporal 
stability should be present across the entire set of iiDMRs 
found from the Illumina450K data obtained from the 2010 
samplings. 

iiDMRs at promoters anti-correlate with H2A.Z and 
preferentially associate with lowly expressed 
or silent genes 

We next wanted to investigate whether iiDMRs are asso- 
ciated with altered chromatin states or gene expression, 
focussing on promoter-associated iiDMRs as it is currently 
not straightforward to correlate the activity of gene distal 
regulatory elements with gene expression levels. Promoters 
were defined as a 1 kb window centred at the annotated 
transcriptional start site (TSS). We observed a statistically 
significant anti- correlation between DNA methylation and 
H2A.Z at both intra- and inter-MZ pair promoter iiDMRs 
(Figure 3A and Additional file 1: Figure S3). This was espe- 
cially marked for iiDMRs associated with >5% absolute 
methylation differences. Significantly, the negative H2A.Z- 
DNA methylation correlation was observed even at sites 



>1 kb away from the promoter (Figure 3A). Therefore, our 
data show that a significant number of iiDMRs, even those 
that are non-genetically determined, represent genomic 
sites that harbour perturbed chromatin states, not just 
DNA methylation differences, and thus can be classified as 
epialleles [19]. 

Interestingly, we observed only a very weak anti-corre- 
lation between promoter iiDMRs and RNA-seq gene 
expression levels (Figure 3B). This was true for both 
intra- and inter-MZ pair iiDMRs, and not significantly 
improved even when considering just those iiDMRs that 
were anti-correlated with H2A.Z, or iiDMRs with a 
large magnitude, or those found within CpG islands or 
CpG shores (Figure 3B). Not surprisingly, a direct com- 
parison between intra-/inter-MZ pair H2A.Z variation 
and expression did not reveal any significant correla- 
tions either (Additional file 1, Figure S4). 

The relationship between promoter DNA methylation 
and gene expression is known to be complex, even in 
contexts where large DNA methylation differences are 
generally observed, for example, genetically-encoded dif- 
ferentiation programs or cancer [5]. In our case, we 
found it surprising that even inclusion of H2A.Z informa- 
tion did not improve the strength of the correlations as, 
theoretically, integration of information from different 
components of the 'epigenetic' state of a region should 
yield more robust correlations. To further explore the 
cause for these observations, we compared mean expres- 
sion levels of iiDMR promoter-associated genes with all 
genes in our dataset. We found promoter iiDMR-asso- 
ciated genes to be expressed at significantly lower levels, 
relative to the other genes, in all intra- and inter-MZ pair 
comparisons (Figure 3C). In other words, it seems that 
promoter-iiDMRs are associated with genes that are 
lowly expressed or silent in CD 14+ cells. 

These observations were reminiscent of previous results 
and our own report on human ageing-associated DNA 
methylation dynamics [20,21]. In those studies, human 
promoters that become hypermethylated with chronologi- 
cal age (aDMRs) were also associated with genes expressed 
at relatively low levels in the analysed tissue. Notably, 
these aDMRs were strongly enriched for promoters har- 
bouring bivalent chromatin domains in embryonic stem 
(ES) cells. Bivalent domains harbour both H3K4me3, gen- 
erally considered an active mark, and H3K27me3, gener- 
ally considered an inactive mark [22,23]. Furthermore, 
bivalent domain promoters are associated with develop- 
mentally important and tissue-restricted genes [22,23]. 
An analysis of previously published H3K4me3 and 
H3K27me3 ChlP-seq profiles in human ES cells [24] 
revealed that promoter iiDMRs were only moderately 
enriched for bivalent chromatin domains (relative to gen- 
ome average, refer to Figure 3D). Surprisingly though, 
strong statistically significant enrichment was observed for 
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Figure 3 Promoter iiDMRs anti-correlate with H2A.Z changes and are preferentially found at genes expressed at low levels. (A) Anti- 
correlation between inter-individual DNA methylation (for only >5% methylation differences) and H2A.Z differences in CD14+ cells. Data are 
shown as mean and 95% credible intervals. (B) Inter-individual promoter epiallelic differences are not correlated with gene expression 
differences. For each bin of inter-individual DNA methylation differences we calculated the ratio of log transformed RNA-seq reads (FPKM) in the 
first individual of each comparison respect to the other. Data are represented as 95% credible intervals on the mean. The left panel shows the 
correlations using DNA methylation data only, the middle panel uses those epialleles in which DNA methylation and H2AZ are anti-correlated, 
and the right panel looks at epiallelic variation at CpG shores (as defined in the lllumina450K array annotation file). (C) iiDMRs are predominantly 
found at genes expressed at low levels. Shown is the distribution RNA-seq reads (expressed as FPKM) for iiDMRs and non-iiDMRs. In all cases the 
curves corresponding to DMRs are shifted to the left compared to non-DMRs showing that DMRs occur preferentially at genes expressed at low 
levels. For all cases 31/32 and 21/22 plots correspond to MZ twin comparisons; 31/21, 31/11 and 11/22 correspond to unrelated individuals 
comparisons. (D) Promoter iiDMRs occur preferentially at regions depleted for H3K4me3 and enriched for H3K27me3 (K4lo/K27hi) or depleted for 
both these marks (K4lo/K27lo) in human embryonic stem (hES) cells. hES cell H3K4me3 and H3K27me3 ChlP-seq data are from [24]. The dashed 
line in each plot refers to the overall iiDMR fraction against the whole dataset. (E) lllumina450K probes were ranked by methylation-ageing 
correlation using data from [20], and the 500 most- age-associated probes were taken as aDMRs, while 500 randomly selected probes from the 
remainder of the dataset were taken as controls. For each set, we collected absolute methylation differences from each of the five possible 
pairings of individuals in this study, and plot 95% credible intervals on the mean. 



the high H3K27me3/low H3K4me3, or low H3K27me3/ 
low H3K4me3 states in ES cells. Both these chromatin 
states are also strongly associated with tissue-restricted 
genes and indeed we found iiDMR promoters were signifi- 
cantly less likely to be associated with house keeping 
genes (P < 10~ 5 in all five possible pair-wise comparisons, 
Chi-squared test). Re-analysis of the 500 most age-corre- 
lated probes (that is, aDMRs) from [20] revealed that these 
probes display significantly more intra- and inter-MZ pair 
variability than 500 randomly selected probes (P < 0.01, 
bootstrapped for all five comparisons, Figure 3E). So the 



common property between iiDMR and aDMR promoters 
seems to be a strong association with genes that are tis- 
sue-restricted, but are only moderately active, or inactive, 
in the analysed tissue. 

Analysis of iiDMRs in an independent cohort 

To independently validate the findings above, we re-ana- 
lysed DNA methylomic data from our previously pub- 
lished human ageing study, that is, the test set [20]. This 
larger and independent dataset consists of Illumina27K 
methylome profiles of whole blood obtained from 30 
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different healthy female MZ pairs of European ancestry, 
ranging in age from 25 to 79 years old [20]. Although the 
450K array overall has approximately 15X as many probes 
as the 27K, the majority of these are outside of promoter 
regions. Of the total annotated protein-coding genes in 
the human genome (21,665), 19,409 (89.6%) are associated 
with at least one promoter probe on the 450K array, 
whereas 14,400 (66.5%) are associated with at least one 
promoter probe on the 27K array. We calculated root 
mean square (RMS) differences for each probe on the Illu- 
mina27K array across the 30 different MZ pairs for both 
intra- and inter-MZ pair comparisons. The RMS deviation 
is a measure of the inter-individual methylation variability, 
and is directly proportional to the level of variability 
observed in the cohort under study. RMS difference, as 
opposed to mean difference, was used because the differ- 
ences will be in an arbitrary direction for each pair. It is 
important to note that the RMS difference is not a mea- 
sure of directional age-related changes. Analysis of the test 
set resulted in several key conclusions, applicable to both 
intra-MZ pair and inter-MZ pair comparisons, validating 
our findings from the discovery set. First, iiDMR-asso- 
ciated promoters found in the test set were associated 
with genes expressed at significantly lower levels com- 
pared with the non-iiDMR set of promoters in the CD 14+ 
RNA-seq data generated in our study, and in a previously 
published whole blood array-based expression dataset 
(Figure 4A and Additional file 1, Figure S5). Second, the 
test-set iiDMRs were significantly depleted for housekeep- 
ing genes (P <10" 5 for either intra-MZ pair or inter-MZ 
pair comparisons, Chi-squared test). Third, iiDMRs identi- 
fied in the test set were significantly enriched at promoters 
that harbour either high levels of H3K27me3 and low 
levels of H3K4me3, or are devoid of both of these marks 
in ES cells (Figure 4B). The fact that our findings from 
CD 14+ cells in discovery set were validated by unsorted 
whole blood cell data in the test set further supports the 
robustness of our key conclusions. We also determined 
that for all five possible comparisons in the discovery data- 
set, there's a strong correlation of both intra- and inter- 
MZ pair variability between the discovery and test datasets 
(Figure 4C). Finally, we also asked if there is an overlap 
between intra- and inter-MZ pair iiDMRs. This was done 
using the test set since it has significantly more pairs. 
We found that mostly the same probes show the highest 
variability in both intra- and inter-MZ pair comparisons 
(Figure 4D). Interestingly, inter-MZ pair comparisons show 
bigger differences at the same sites relative to intra-pair 
comparisons but the great majority of these show greater 
variability in the between-pair comparisons (Figure 4D). 

Discussion 

Our data reveal several novel and important features of 
mammalian epialleles. First, we find that even non- 



genetically determined epialleles can be temporally stable 
(at least over the course of 2 years). That is, a significant 
fraction of these epialleles are not just transient epige- 
netic perturbations with little prospect of influencing 
molecular function. Second, inter-individual DNA 
methylation variants are associated with perturbations of 
chromatin state, a relationship observed for even small 
differences, for example, down to approximately 5% 
methylation difference, and therefore can be considered 
as bona fide epigenetic perturbations. Of course, future 
studies using bigger sample numbers are needed to 
further explore our initial findings. 

The most significant aspect of our study is the finding 
that the correlation of iiDMRS with gene expression differ- 
ences is very weak and that iiDMRs are preferentially 
found in regions of relative transcriptional inactivity. So 
what are the implications of this? First, it is possible that 
some promoter epialleles show inter-related DNA methy- 
lation and chromatin state perturbations, but may not 
impact significantly on genome function, at least as mea- 
sured by steady state transcriptional activity. In the case of 
non-genetically determined epialleles, maybe all promoters 
are potentially subject to epiallelic variation, but the more 
active ones are 'cleared' of aberrant epigenomic variants, 
whereas the less active/silent promoters can accumulate 
epigenetic variation. But the enrichment of epialleles in 
less active/silent promoters was also found in comparisons 
between unrelated individuals. Although it is hard to say 
what proportion of epialleles between unrelated indivi- 
duals are due to genetic as opposed to environmental dif- 
ferences from our data, the genetic influence on DNA 
methylation profiles is well documented [3,4,25]. Bell and 
colleagues measured genome-wide methylation in 77 Hap- 
Map Yoruba individuals, for which gene expression and 
genotype data were available, and found a strong genetic 
component to inter-individual variation in DNA methyla- 
tion profiles [4]. Although they found a significant enrich- 
ment of SNPs that affect both methylation and gene 
expression, they also noted that the total number of genes 
showing such a signal is only a small proportion of the 
total number of methylation variants they identified [4]. A 
similar conclusion was reached by Myers and colleagues 
who analysed genome-wide methylation in six members of 
a three generation family and found that only 22% of 
genes harbouring genotype-dependent DNA methylation 
exhibited allele-specific gene expression (albeit more than 
expected by chance) [25] . Therefore, in both cases the cor- 
relation between genetically determined DNA methylation 
and expression is at best modest, which would be consis- 
tent with our results regarding chromatin state. 

It is possible that epiallelic variation acts in a manner 
not evident from simple correlations with steady-state 
expression levels in a given tissue. First it is possible that 
these correlations are tissue-restricted as has recently 
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Figure 4 Validation of iiDMRs in an independent cohort. (A) For this analysis we used 30 MZ twin pairs from [20] whose whole blood DNA 
methylation profiles were generated by lllumina 27K arrays. We defined low and high expression based on the RNA-seq data we generated in this 
study from CD14+ cells. High expression: >1 FPKM, Low expression: <1 FPKM. (B) Promoter iiDMRs in the test set are preferentially enriched at 
regions low in H3K4me3 and high in H3K27me3, and in regions that lack both of these marks in hES cells. The analysis was performed essentially as 
described in the legend for Figure 3D. (C) Intra- and inter-MZ pair iiDMRs are correlated in the discovery and validation cohorts. For the 30 MZ pairs 
from [20], we measured intra-pair methylation variability by taking the RMS (root-mean-square) methylation differences for each probe. We also 
calculated a similar inter-pair variability measure by permuting the pairs. For each of the five possible pairs in the current study, we bin probes by 
methylation difference, and plot the mean inter- and intra-pair validation methylation variabilities. (D) For the 30 MZ pairs from [20], we calculated 
intra-pair and inter-pair RMS methylation variability as above and show a scatter plot for all probes on the lllumina27K array (with exclusions as 
described in the Methods for the Hlumina450K data). 



been shown for genetically determined tissue-restricted 
gene expression [26]. Alternatively, conclusions from two 
recent studies, although not focusing on DNA methyla- 
tion/chromatin state in mammals, hint at other potential 
mechanisms by which epialleles could act. Yvert and col- 
leagues recently compared H3K14 acetylation profiles 
between two strains of the yeast Saccharomyces cerevi- 
siae, and found 5,442 sites that significantly differed in 
H3K14ac levels, which they called single nucleosome epi- 
polymorphisms (SNEPs) [27]. However, higher acetyla- 
tion in one strain did not always mean higher expression 
of the relevant gene, for example, in one case the SNEP 
was associated with the strength of gene activation upon 
stimulation by heat shock. Secondly, Lindgren and collea- 
gues recently assessed the effect of naturally occurring 
variation in miRNA expression levels on mRNA levels in 
humans, but found little correlation [28]. The authors 
concluded that their findings were more consistent with 
the primary role of miRNAs being to buffer mRNA 
levels. A key conclusion therefore is that correlating 
epialleles with steady-state RNA dynamics, possibly the 
most common analysis currently presented in papers on 
epiallelic investigations, may not be particularly fruitful. 

Finally, and potentially most importantly, the broadly 
similar characteristics of iiDMRs and aDMRs (from our 
previous study [20] and [29]) may in fact be a general fea- 
ture of mammalian epiallelic variation in a variety of con- 
texts. Meissner and colleagues found that aberrant 



gradual hyper-methylation during in vitro cell culture is 
found at promoters associated with genes not expressed 
in that cell type [30]. Additionally, it has been found in a 
variety of human cancers that bivalent chromatin 
domains (associated with low transcriptional activity in 
stem cells) are preferential targets of hyper-methylation 
[31-33]. The common thread among these seemingly dis- 
parate examples of inter-individual epigenetic variation is 
promoters that are developmentally regulated and tissue- 
restricted, and are only moderately active, or inactive, in 
the analysed tissue. We propose that there could be a 
potentially important mechanistic link between normal/ 
stochastic epiallelic variation and the epigenetic perturba- 
tions observed in the context of cancer and ageing. 

Conclusions 

The existence of mammalian epialleles is not in doubt, 
but the key challenge now is to characterise epialleles at 
the molecular level. Our work reveals key and novel 
properties of epiallelic variation in humans, and further 
suggests important mechanistic links between normal 
inter-individual epigenetic variation and epigenetic per- 
turbations observed in cancer and chronological ageing. 

Materials and methods 

Samples 

Fresh venous blood was obtained from three pairs (six 
individuals) of healthy MZ twins (Additional file 1, 
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Table SI). Blood was diluted 1:1 in RPMI media and 
then peripheral blood mononuclear cells (PBMCs) were 
separated by Ficoll-Hypaque gradient centrifugation. 
CD 14+ cells were isolated according to manufacturer's 
instruction using magnetic bead-based positive selection 
system (Miltenyi Biotech). The purity of the cells was 
determined by FACS using CD14-FITC antibodies 
(Additional file 1, Figure SI). All subjects gave informed 
consent and the study was approved by the Northern 
and Yorkshire Research Ethics committee (REC Refer- 
ence Number: 06-MREO-3-22). Validation of iiDMRs 
was done using whole blood Illumina27K data pre- 
viously generated [20]. This cohort included 30 different 
healthy MZ female twin pairs recruited from within the 
UK as part of the TwinsUK registry. 

Illumina 450K array 

A total of 500 ng of DNA from CD 14+ cells isolated using 
QIAamp DNA Mini Kit was bisulfite converted using the 
EZ DNA Methylation kit (Zymo Research). Arrays were 
processed at the Barts and The London Genome Centre, 
London, UK according to the manufacturers recommen- 
dations. Methylation scores for each CpG site are called as 
'Beta values (using BeadStudio software from Illumina), 
that range from 0 (unmethylated (U)) to 1 (fully methy- 
lated (M)) on a continuous scale, and are calculated from 
the intensity of the M and U alleles as the ratio of fluores- 
cent signals. 

Illumina Omni2.5S arrays 

The arrays were processed according to the manufac- 
turer's instructions using 500 ng of DNA. 

ChIP- and RNA-seq 

The chromatin immunoprecipitation (ChIP) assay was 
performed on 5 x 10 5 CD14+ cells according to pre- 
viously published protocols with minor modifications 
[34] . Chromatin was sonicated to get fragments of 100 to 
500 bp and immunoprecipitated with 10 uL of anti-H2A. 
Z antibody (Active Motif, Cat no: 39113). ChlP-seq 
libraries were prepared following the Illumina protocol 
and ligated to standard PE adaptors and sequenced on an 
Illumina GAIIx instrument. For RNA-seq, 200 ng of total 
RNA was used to prepare RNA-seq libraries using the 
TruSeq RNA kit from Illumina following the instructions 
provided in the suppliers manual, and sequenced on an 
Illumina GAIIx instrument. 

Sequence data processing 

ChlP-seq reads were mapped to the GRCh37 (hgl9) 
reference genome sequence using MAQ 0.6.6 and map- 
pings with quality scores <10 were discarded. For 
iiDMR-centric analyses, we counted the numbered of 
paired end fragments overlapping each probe region on 



the Illumina array and used that as a ChIP score for 
that probe. RNA-seq reads were mapped to the refer- 
ence genome using Tophat 1.3.1, then expression levels 
(FPKM) were estimated for each Ensembl transcript 
using Cufflinks 1.0.3. For analyses comparing methyla- 
tion data to expression, methylation array probes lying 
within 1 kb of an Ensembl TSS were assigned an 
'expression level' equal to that of the transcript asso- 
ciated with the nearest TSS. 

Statistical analyses 

Correlation between variables was performed using 
Spearman's rank test. Confidence intervals for all box/bar 
plots are obtained by bootstrapping unless otherwise sta- 
ted. Confidence intervals for the hES cell H3K4me3/ 
H3K27me3 bar charts are estimated from a binomial 
model. Probes associated with housekeeping genes were 
defined as in [20]. 

For the genomic location enrichment analyses, exon, 
intron and regulatory features were extracted from 
Ensembl, and promoters were defined as regions within 
1 kb of the TSSs of an Ensembl gene. For each of these 
categories, we asked what fraction of the probes lying in 
the selected regions were called as iiDMRs, and plot 
95% confidence intervals on this proportion, estimated 
using a binomial model. For comparison, the feint line 
indicates the fraction of iiDMRs across the whole data- 
set, allowing enrichment or depletion to be assessed. 

Accession codes 

All data are available on GEO [GSE46220]. 
Additional material 



Additional file 1: Additional Materials and methods, Tables SI and 
S2, and Figures S1 to S5. 
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